Deterministically Estimating Data Stream Frequencies

Author

  • Sumit Ganguly
Abstract

We consider updates to an $n$-dimensional frequency vector of a data stream, that is, the vector $f$ is updated coordinate-wise by means of insertions or deletions in any arbitrary order. A fundamental problem in this model is to recall the vector approximately, that is, to return an estimate $\hat{f}$ of $f$ such that $|\hat{f}_i - f_i| < \epsilon \|f\|_p$ for every $i = 1, 2, \ldots, n$, where $\epsilon$ is an accuracy parameter and $p$ is the index of the $\ell_p$ norm used to calculate $\|f\|_p$. This problem, denoted by ApproxFreq$_p(\epsilon)$, is fundamental in data stream processing and is used to solve a number of other problems, such as heavy hitters, approximating range queries and quantiles, and approximate histograms. Suppressing poly-logarithmic factors in $n$ and $\|f\|_1$, for $p = 1$ the problem is known to have $\tilde{\Theta}(1/\epsilon)$ randomized space complexity [2, 4] and $\tilde{\Theta}(1/\epsilon^2)$ deterministic space complexity [6, 7]. However, the deterministic space complexity of this problem for any value of $p > 1$ is not known. In this paper, we show that the deterministic space complexity of ApproxFreq$_p(\epsilon)$ is $\tilde{\Theta}(n^{2-2/p}/\epsilon^2)$ for $1 < p < 2$, and $\Theta(n)$ for $p \ge 2$.
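The approximation guarantee defined above can be made concrete for $p = 1$ with a small deterministic structure. What follows is a minimal sketch under stated assumptions, not the algorithm of this paper. It assumes a strict-turnstile stream (every $f_i \ge 0$ at query time) over items $0, \ldots, n-1$, and it swaps in a simple residue-class technique: one table of counters per prime $q$, with item $i$ mapped to counter $i \bmod q$. The names `satisfies_approx_freq`, `primes_from`, `ResiduePointQuery` and the tuning parameter `p_min` are hypothetical.

```python
import math


def lp_norm(f, p):
    """The l_p norm ||f||_p of a frequency vector."""
    return sum(abs(x) ** p for x in f) ** (1.0 / p)


def satisfies_approx_freq(f_hat, f, eps, p):
    """Check the guarantee |f_hat[i] - f[i]| < eps * ||f||_p for every i."""
    bound = eps * lp_norm(f, p)
    return all(abs(a - b) < bound for a, b in zip(f_hat, f))


def primes_from(start, count):
    """Return the first `count` primes that are >= start (trial division)."""
    primes, k = [], max(start, 2)
    while len(primes) < count:
        if all(k % d for d in range(2, math.isqrt(k) + 1)):
            primes.append(k)
        k += 1
    return primes


class ResiduePointQuery:
    """Hypothetical deterministic l_1 point-query sketch for items 0..n-1.

    Assumes a strict-turnstile stream (all f_i >= 0 when queried).
    Illustrates ApproxFreq_1(eps); it is not the paper's algorithm.
    """

    def __init__(self, n, eps, p_min=2):
        p_min = max(2, p_min)
        # L = largest m with p_min**m <= n - 1. Two distinct items share a
        # residue modulo at most L of the chosen primes, because the product
        # of those primes divides their difference, which is at most n - 1.
        L = 0
        while p_min ** (L + 1) <= n - 1:
            L += 1
        # t > L / eps tables make the average collision error < eps * ||f||_1.
        t = math.floor(L / eps) + 1
        self.primes = primes_from(p_min, t)
        self.tables = [[0] * q for q in self.primes]

    def update(self, i, delta):
        """Apply f_i += delta (a negative delta is a deletion)."""
        for q, table in zip(self.primes, self.tables):
            table[i % q] += delta

    def query(self, i):
        """Each counter is f_i plus non-negative colliding mass; take the min."""
        return min(table[i % q] for q, table in zip(self.primes, self.tables))


# Usage: a strict-turnstile stream with insertions followed by deletions.
n, eps = 10_000, 0.1
sketch = ResiduePointQuery(n, eps, p_min=10)
f = [0] * n
for k in range(n):
    item, w = (k * 7919) % n, k % 5 + 1
    sketch.update(item, w)
    f[item] += w
for k in range(0, n, 3):  # delete part of what was inserted; f stays >= 0
    item = (k * 7919) % n
    sketch.update(item, -1)
    f[item] -= 1
f_hat = [sketch.query(i) for i in range(n)]
print(satisfies_approx_freq(f_hat, f, eps, p=1))  # prints True
```

The guarantee here needs no randomness: each estimate is at least $f_i$, and the minimum over tables is no larger than the average, which exceeds $f_i$ by at most $L\|f\|_1/t < \epsilon\|f\|_1$ whenever $f \ne 0$. The sketch makes no claim about space optimality and does not handle $p > 1$.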

Similar resources

Estimating Aggregate Properties on Probabilistic Streams

The probabilistic-stream model was introduced by Jayram et al. [16]. It is a generalization of the data stream model that is suited to handling "probabilistic" data, where each item of the stream represents a probability distribution over a set of possible events. Therefore, a probabilistic stream determines a distribution over a potentially very large number of classical "deterministic" streams...

Finding Frequent Items in Data Streams

We present a 1-pass algorithm for estimating the most frequent items in a data stream using very limited storage space. Our method relies on a novel data structure called a count sketch, which allows us to estimate the frequencies of all the items in the stream. Our algorithm achieves better space bounds than the previous best known algorithms for this problem for many natural distributions on ...

Processing Data-Stream Join Aggregates Using Skimmed Sketches

There is a growing interest in on-line algorithms for analyzing and querying data streams that examine each stream element only once and have only a limited amount of memory at their disposal. Providing (perhaps approximate) answers to aggregate queries over such streams is a crucial requirement for many application environments; examples include large IP network installations where performan...

Estimating Flood Discharge for Different Return Periods in the Zayandeh Rud Watershed Using the Hybrid Regional Method

Designers of hydraulic structures are often faced with the problem of estimating flood frequencies at stream sites, where little or no flow information is available. A regional regression model is widely used which relates physical and climatological parameters to flow characteristics. In this study, a new method is used which is based on the station-year technique and combined records for seve...

Publication date: 2009